COVID-19 mortality is associated with pre-existing impaired innate immunity in health conditions

COVID-19 can be life-threatening to individuals with chronic diseases. To prevent severe outcomes, it is critical that we comprehend pre-existing molecular abnormalities found in common health conditions that predispose patients to poor prognoses. In this study, we focused on 14 pre-existing health conditions for which increased hazard ratios of COVID-19 mortality have been documented. We hypothesized that dysregulated gene expression in these pre-existing health conditions were risk factors of COVID-19 related death, and the magnitude of dysregulation (measured by fold change) were correlated with the severity of COVID-19 outcome (measured by hazard ratio). To test this hypothesis, we analyzed transcriptomics data sets archived before the pandemic in which no sample had COVID-19. For a given pre-existing health condition, we identified differentially expressed genes by comparing individuals affected by this health condition with those unaffected. Among genes differentially expressed in multiple health conditions, the fold changes of 70 upregulated genes and 181 downregulated genes were correlated with hazard ratios of COVID-19 mortality. These pre-existing dysregulations were molecular risk factors of severe COVID-19 outcomes. These genes were enriched with endoplasmic reticulum and mitochondria function, proinflammatory reaction, interferon production, and programmed cell death that participate in viral replication and innate immune responses to viral infections. Our results suggest that impaired innate immunity in pre-existing health conditions is associated with increased hazard of COVID-19 mortality. The discovered molecular risk factors are potential prognostic biomarkers and targets for therapeutic intervention.


INTRODUCTION
COVID-19 was declared a global pandemic by the World Health Organization (WHO) as of March 11, 2020(World Health Organization, 2021a. The outbreak has seen over 264 million cases and over 5.2 million deaths as of December 2021, and these numbers are still rising (World Health Organization, 2021b). The clinical spectrum of illness ranges (Williamson et al., 2020). Molecular profiles of individuals affected by these conditions and those unaffected can be found in public repositories, such as the Gene Expression Omnibus (GEO) database (Barrett et al., 2013). Because these data were generated before the pandemic, all samples were COVID negative. It is widely acknowledged that patient gene expression profiles reflect the underlying pathological processes (Kurreck & Stein, 2015). While different diseases target different tissues and organs, transcriptomes of peripheral blood cells are informative about systematic changes of a person's overall health (Mesko et al., 2010;Aziz, Zaas & Ginsburg, 2007). Therefore, we chose to examine peripheral blood transcriptomes in this study. We identify transcriptional dysregulations observed in multiple pre-existing conditions and correlate their fold changes with HRs of COVID-19 mortality. These recurrent dysregulations correlated with HRs are molecular risk factors predisposing COVID-19 patients to severe outcomes. We further analyzed functional relationships of these risk genes, which converged onto impaired innate immunity as a systemic mechanism that weakens host defense against SARS-CoV-2 infection.
While we focused on gene expression risk factors in this study, our analytic approach is applicable to other omics-level profiles, such as epigenetic and metabolomic data. Integration of these discoveries will allow for better prediction of severe outcomes of COVID-19 and inform the development of preventative measures to reduce fatality. Furthermore, the long-term sequelae of COVID-19 survivors are currently unknown. A greater apprehension of the disease mechanisms in the context of comorbidities will serve for future evaluation of the health impact of COVID-19 on patients with chronic diseases.

Data sets
The OpenSAFELY project reported age-sex-adjusted HRs of COVID-19 related deaths for 23 groups of pre-existing health conditions. We excluded the two cancer groups (solid tumors and hematological malignancies) due to the extremely high heterogeneity of cancers (Chandrashekar et al., 2020). For each remaining health condition, we searched the GEO database (Barrett et al., 2013) to identify transcriptomics studies involving affected individuals (cases) and unaffected individuals (controls, Fig. 1A). We limited our queries to peripheral blood samples as a modus operandi of removing confounders related to different tissue types and encapsulating disease characteristics at a systemic level. We further limited our query to microarray-based transcriptomic profiles to reduce technical variance. If multiple data sets were available for a health condition, we chose the one with the largest sample size. We downloaded the normalized gene expression values.

Identify dysregulated gene expression in pre-existing health conditions
Given a transcriptomic data set, we used the Student's t-test to compare expression levels of each gene in cases vs controls (Fig. 1B). Specifically, we used the t.test() function in R and set the following parameters (alternative="two.sided", paired=F). Aiming to be inclusive at this step, we considered genes with nominal P value <0.05 to be differentially expressed. If multiple probes on the microarray represented the same gene, we kept the one with the lowest P value and removed the other ones to avoid redundancy. For a differentially expressed gene (DEG), we computed the fold change (FC) as the ratio of the mean expression level in cases over controls. For a non-DEG, we set the FC to one. If a gene was differentially expressed in at least four health conditions, it was designated a recurrent DEG (rDEG).

Identify molecular risk factors of COVID-19 mortality
The gene expression data and the COVID-19 data were derived from different cohorts. To perform cross-cohort analysis, we linked FCs of gene expression and HRs of COVID-19 mortality at the health condition level. For a given rDEG, we tested if its FCs in pre-existing conditions were correlated with the corresponding HRs using the Pearson correlation test (Fig. 1C). Specifically, we used the cor.test() function in R and set the following parameters (method="pearson", use="pairwise.complete.obs"). We corrected for multiple comparisons by converting nominal P values to false discovery rates (FDRs) using the Benjamini-Hochberg method, as implemented in the p.adjust() function in R. FDR < 0.05 indicated significant molecular risk factors. FDR > 0.05 but a nominal P value <0.01 indicated marginal risk factors. The positive or negative sign of a Pearson correlation coefficient (PCC) indicated that upregulated gene expression or downregulated gene Compile microarray-based transcriptomic data sets for common health conditions. We included case-control studies using peripheral blood samples. (B) Find differentially expressed genes (DEGs) for each health condition. Recurrent DEGs were those differentially expressed in at least four health conditions. (C) Perform correlation tests to identify molecular risk factors, which are pre-existing expression dysregulations that increase the risk of COVID-19 related death. (D) Examine functional relationships of molecular risk factors via enrichment and network analyses. (E) Validate molecular risk genes using published data from an in vitro SARS-CoV-2 infection study (Chiu, Macmillan & Chen, 2009).

Functional categorization and analysis
We classified molecular risk factors into overlapping gene sets based on annotations of biological processes in the Gene Ontology (GO) database and pathways in the KEGG database. For each gene set, we tested if it was overrepresented using the Fisher's exact test. We corrected for multiple comparisons by converting nominal P values to FDRs. We built association networks of enriched gene sets (FDR < 0.05) and examined their relationships using the cnetplot() function from the R/enrichplot package (Yu et al., 2012) (Fig. 1D). To assess the power of the association network, we performed hierarchical clustering of the gene sets. Specifically, we constructed an input matrix with genes as rows and GO terms as columns. If a gene is annotated with a GO term, the corresponding value in this matrix is set to 1. Otherwise, it is set to 0. We then performed hierarchical clustering and computed the Goodman-Kruskal-gamma index (Baker & Hubert, 1975) for various numbers of clusters using the NbCluster method (distance="binary", index="gamma", method="ward.D2", max.nc=9) (Charrad et al., 2014).

Experimental validations
To validate if the identified molecular risk factors at the gene level reflected the molecular changes at the protein level observed in SARS-CoV-2 infections, we used the published proteomic data set from an in vitro infection experiment (Bojkova et al., 2020). In this experiment, human Caco-2 cells were infected with SARS-CoV-2 viruses. Infected cells and mock controls, each with three replicates, were cultured for 24 h. Quantifications of 6,381 proteins were obtained and compared between the infected cells and the mock control cells with two-group t tests. We obtained the normalized protein quantification data and the pre-computed t-test P values from the Table S2 in the published study (Bojkova et al., 2020). Given a risk factor gene, we searched the proteomics data set for the matching protein based on the gene symbol. To validate a risk factor gene, we required that its transcriptional dysregulation pattern was consistent with the dysregulated protein expression pattern (Fig. 1E). Specifically, if upregulation of the risk factor gene was associated with an increased COVID-19 mortality rate, the matching protein shall also be upregulated in SARS-CoV-2 infected cells as compared to mock control cells. If downregulation of the risk factor gene was associated with an increased COVID-19 mortality rate, we required the matching protein also to be downregulated in SARS-CoV-2 infected cells as compared to mock control cells. R source codes used in this study are available at https://github.com/liliulab/COVID19_ Mortality_Association/.

DEGs in pre-existing health conditions
Our search of the GEO database found qualified transcriptomic data for 14 health conditions. For each health condition, we identified DEGs with Student t-test P < 0.05. This lenient cutoff allowed us to be as inclusive as possible at this step. On average, each health condition was associated with 5,777 DEGs (range 1,215 to 17,688). Most of the DEGs were downregulated in cases as compared to controls (mean FCs range from 0.003 to 0.160). Among a total of 25,552 genes analyzed, we found 11,930 rDEGs, that is, those that were differentially expressed in at least four health conditions. Table 1 presents the summary statistics of DEGs.

Pre-existing expression dysregulations increase COVID-19 death risks
For each rDEG, we tested if its FCs in different health conditions were correlated with HRs for COVID-19 mortality. For health conditions where this gene was not differentially expressed, we set the FCs to 1 and included them in the correlation test as well. Among a total of 11,930 DEGs, we found no significant molecular risk factor that passed the stringent FDR < 0.05 threshold. However, 231 genes passed the Pearson correlation test P < 0.01 threshold and were considered as marginal molecular risk factors. Among them, upregulated expression of 70 genes and downregulated expression of 181 genes increased risk of COVID-19 related death (Table S1).
The RPS28 gene had the most significant correlation P value (0.0003). Its FCs in pre-existing conditions were positively correlated with HRs of COVID-19 mortality (PCC = 0.83, Fig. 2A). RPS28 encodes a component of the 40S subunit of the ribosome where a cell synthesizes proteins. It was differentially expressed in six health conditions, including rheumatoid arthritis, chronic obstructive pulmonary disease, alcoholic hepatitis, multiple sclerosis, HIV, and chronic kidney disease. As its FC increased from 0.98 to 1.14, the HR of COVID-19 mortality increased from 1.30 to 3.48. Furthermore, the list of molecular risk factors contained seven additional genes that encode ribosomal components (RPLP1, RPLP2, RPL13, RPL23A, RPL30, RPL38, and RPS11). Except for RPL30, upregulation of these genes consistently increased the HR of COVID-19 mortality (P range = 0.001 to 0.009, PCC range = 0.71 to 0.78, Fig. 2B). This is in accordance with the positive viral infection-regulating roles of ribosomal proteins (Li, 2019). Notably, three ribosomal proteins in our list are required for early virus accumulation (Campos et al., 2016), though mechanistic studies in SARS-CoV-2 are still lacking.
The POLR3GL gene showed the most significant negative correlation (P = 0.0008, PCC = -0.81, Fig. 2C). POLR3GL encodes a subunit of RNA polymerase III that catalyzes the transcription of DNA into RNA. It induces production of interferon (IFN-a/β) to inhibit virus replication (Chiu, Macmillan & Chen, 2009;Samuel, 2001). Consistent with this function, pre-existing downregulation of POLR3GL in eight health conditions (Alzheimer's disease, chronic kidney disease, alcoholic hepatitis, chronic obstructive pulmonary disease, HIV, multiple sclerosis, rheumatoid arthritis, and type-2 diabetes) increased the risk of COVID-19 related death. Furthermore, the list of risk factors contained two additional genes, LSM14A and IFRD1, that regulate interferon signaling. For these genes, downregulation increased HRs of COVID-19 mortality (P = 0.003 and 0.005, PCC = −0.74 and −0.71, respectively, Fig. 2D). Interestingly, we did not find interferons as molecular risk factors, presumably due to their transient expression profiles.

Functional groups enriched with risk factors
We classified the list of marginal molecular risk factors into functional gene sets based on Gene Ontology and KEGG annotations and then performed enrichment analysis. At FDR < 0.05, these molecular risk factors were significantly enriched in 10 biological processes and three pathways ( Table 2). Most of these gene sets were related to viral transcription, mRNA processing and metabolism, protein synthesis, and endoplasmic reticulum (ER) function. We then built an association network to examine functional relationships of the enriched gene sets (Fig. 3A). Based on hierarchical clustering analysis and Goodman-Kruskal-gamma index, genes in this association network formed seven clusters (gamma = 1.0, Fig. S1A). Merging these clusters via GO terms produced three supersets (gamma = 0.84, Fig. S1B), each composed of interconnected gene clusters. Genes bridging these sets indicate crosstalk between supersets. The first superset consisted of gene clusters involved in viral transcription, translational initiation, mRNA catabolic process, and proteins targeting ER. It included eight common risk factor genes encoding ribosomal components that function in ER. Except RPL30, upregulation of all these genes correlated with increased HRs of COVID-19 mortality. Conversely, for most of the other genes (7 out of 9) in this superset, downregulation correlated increased HRs of COVID-19 mortality, including two genes (SEC62 and SGTB) that target ER and participate in degradation of misfolded proteins. Therefore, pre-existing abnormal functions in ER, specifically upregulated protein synthesis and downregulated degradation of unfolded protein, were associated with high risk of COVID-19 death. The second superset consisted of a single gene cluster involved in RNA splicing. For almost all genes (11 out of 14) in this set, downregulation was associated with increased HRs of COVID-19 mortality. In COVID-19 patients, host RNA splicing was significantly disrupted by SARS-CoV-2 . Our observation suggests that pre-existing downregulation of RNA splicing genes can potentially aggravate such disruptions.
The third superset consisted of a single gene cluster participating in the interleukin-12mediated signaling pathway. Noticeably, one of the risk factor genes in this set, PPIA, has been shown to act as a potential mediator between human SARS coronavirus nucleoprotein and BSG/CD147 during the process of invasion of host cells by the virus (Chen et al., 2005). Consistent with this previous study, we found that pre-existing upregulation of this gene increased COVID-19 mortality risk in nine common health conditions (Fig. 3B). Interestingly, interleukin-12, a pro-inflammatory cytokine (IL12A and IL12B genes) was not a molecular risk factor. It was dysregulated in two health conditions but did not meet the criterion of being a rDEG, which required dysregulation in at least four health conditions. Its receptor IL12RB1 was a near miss, showing a positive correlation between FC and HR in five health conditions (P = 0.079, PCC = 0.48).
Crosstalk between the second and third supersets is via HNRNPA2B1, which binds heterogeneous nuclear RNA (hnRNA) and subsequently induces IFN-a/β production to inhibit virus replication.
Two other hnRNA binding proteins, HNRNPH3 and HNRNPDL, were also molecular risk factors. For all three genes, pre-existing downregulation increased COVID-19 mortality risk, presumably by blocking IFN-a/β production, which compromises innate immunity (Fig. 3C).

Dysregulated cell death and mitochondrial functions
The innate immune system is intrinsically connected with programmed cell death (Birge & Ucker, 2008) and mitochondrial functions (West, Shadel & Ghosh, 2011). In accordance with this, our list of molecular risk factors contained eight genes involved in apoptosis, seven genes involved in autophagy, and 20 genes involved in mitochondrial function (Table S2). Although these gene sets did not pass the stringent threshold of FDR < 0.05 in the enrichment analysis, they were overrepresented in several biological processes with borderline nominal P values, including "mitochondrial RNA metabolism" (P = 0.001), "regulation of apoptotic signaling pathway" (P = 0.05), "mitochondrial organization" (P = 0.04), "autophagy of mitochondrion" (P = 0.08), and "autophagosome assembly" (P = 0.08).
To examine how these dysregulated processes correlated to COVID-19 mortality, we built another association network (Fig. 4A). Autophagy and apoptosis are two mechanisms of programmed cell death that inhibit each other (Thorburn, 2008). Our association network contains six genes that co-regulate these processes. In normal conditions, these genes keep autophagy and apoptosis in balance. Specifically, BNIP3, FBXW7, RAB7A, STX17, and UBQLN2 induce autophagy and suppress apoptosis, which is counteracted by FLCN (Khandia et al., 2019;Possik et al., 2014). Pre-existing dysregulations of these genes converged to a common pattern, i.e., suppressed autophagy and escalated apoptosis jointly increased the risk of COVID-19 related death (Fig. 4B). Furthermore, downregulation of BNIP3 and FBXW7 suggest compromised mitochondrial organization and reduced mitochondrial autophagy. As disrupted mitochondrial functions, suppressed autophagy, and escalated apoptosis are commonly found in COVID-19 patients (Ajaz et al., 2021;Gassen et al., 2021;Paolini et al., 2021), our results imply if such dysregulations are present prior to SARS-CoV-2 infection, patients are prone to poor prognosis.

Validation in SARS-CoV2 infected samples
We validated the identified risk factor genes using a published proteomics data set from an in vitro study (Bojkova et al., 2020). In this experiment, human Caco-2 cells were infected with SARS-CoV-2 viruses. Infected cells and mock controls, each with three replicates, were cultured for 24 h. Quantifications of 6,381 proteins were obtained. We crossreferenced these proteins with the molecular risk factors based on gene symbols and found 105 gene-protein pairs. For each pair, we first examined if the protein had significantly different expression levels in infected cells compared to mock control cells using the published pre-computed t-test P value <0.05 as the threshold. We then examined if the protein dysregulation pattern was consistent with the dysregulation patterns of the risk factor genes, which would exacerbate the pre-existing aberrations and lead to poor prognosis. We found that 11 proteins were significantly upregulated in the infected cells compared to mock control cells. Among them, three (RPS28, RPLP2, and PDE12) had matching risk factor genes showing consistent patterns, i.e., transcriptional upregulations increased the COVID-19 mortality risk (Fig. 5A). At 24 h after infection, the expression level of these proteins was significantly higher than that in mock controls by 9.5% to 12.2% (P values range 0.00027 to 0.019).
Similarly, we found that 11 proteins were significantly downregulated in the infected cells compared to mock control cells. Among them, six (ADK, DHFR, METAP2, PIN4, SC5D, and TMEM263) had matching risk factor genes showing consistent patterns, i.e., transcriptional downregulations increased the COVID-19 mortality risk (Fig. 5B). At 24 h after infection, the expression level of the corresponding protein was significantly lower than that in mock controls by 5.3% to 28.6% (P values range 0.0004 to 0.04).
In summary, we validated nine molecular risk factor genes by showing the concordance between transcriptional dysregulations caused by pre-existing medical conditions and the protein dysregulations caused by SARS-CoV-2 infection. The joint effects plausibly modulated the clinical outcomes of COVID-19 patients.

DISCUSSION
The drastically different disease progression and prognosis among COVID-19 patients with pre-existing health conditions challenge clinical management of this life-threatening disease. In this study, we integrated publicly available transcriptomics data of common health conditions and COVID-19 epidemiology data to study the molecular mechanisms underlying this complex problem. Our analyses revealed that pre-existing transcriptional dysregulations frequently observed in multiple health conditions increased risk of severe COVID-19 outcomes, plausibly via impairing host innate immunity, as discussed below.
Innate immunity is an integral part of the body's defense system which responds to invading pathogens, as well as damage caused by chronic health conditions (Horiguchi  , 2018). Abnormalities of many biological processes may alter innate immune responses, which in turn reduces host defense against SARS-CoV-2 infection (Schultze & Aschenbrenner, 2021). ER and mitochondria are crucial organelles that lie at the crossroad between host physiological functions and viral infection. On the one hand, various chronic disorders interfere with ER homeostasis and mitochondrial dynamics, giving rise to chronic inflammation that subsequently activates the body's innate immune system (Ozcan & Tabas, 2012;Bhatti, Bhatti & Reddy, 2017). On the other hand, invading viruses hijack ER for entry into host cells and assembly of viral genomes (Inoue & Tsai, 2013). Mitochondrial dynamics are subverted to benefit the intracellular survival of viruses. Emerging evidence suggests that coronavirus infection, including SARS and COVID-19, triggers ER stress and viral replication within mitochondrial structures (Chan et al., 2006;Zhang et al., 2020). If these organelles are already compromised prior to infection, further disruptions of their crucial functions by COVID-19 will likely lead to severe outcomes. Our study supports this hypothesis as we found genes targeting ER and mitochondria were enriched in the list of molecular risk factors. These genes include SEC62 and SGTB that degrade unfolded proteins in ER; UBE2J2, COPS5, MBTPS2, and PPIA that respond to unfolded proteins; and 20 genes that participate in various aspects of mitochondrial functions.
Innate immune responses to virus-infected cells include autophagy to isolate and clear infected viruses or apoptosis and necroptosis to eliminate infected cells. Viruses have evolved mechanisms to inhibit these surveillance processes so that infected cells would not be cleared efficiently, and the viruses could spread (Kvansakul, 2017). Suppressed autophagy and escalated apoptosis have been reported in severe COVID-19 cases (Gassen et al., 2021;Paolini et al., 2021). In this study, we discovered transcriptional dysregulations that predisposed patients to such distortion. Interestingly, two of these genes, FBXW7 and BNIP3, determine the cell fate via autophagy of mitochondria and apoptosis, implying the important role of mitochondrial functions in COVID-19 severity.
Both chronic inflammation in pre-existing health conditions and acute inflammation in pathogen infections regulate interferon production. Reduced antiviral interferon response has been associated with excessive proinflammatory responses in COVID-19 and emerges as a clinical determinant of COVID-19 severity (Blanco-Melo et al., 2020). Our results also imply that interferon production and signaling are suppressed in several health conditions via downregulation of POLR3GL, LSM14A, IFRD1, and three hnRNA binding proteins (HNRNPA2B1, HNRNPH3, and HNRNPDL). Dysregulation of these genes disrupts type I interferon signaling pathways. Furthermore, we found upregulation of PPIA and MTAP genes that activate IL12-mediated signal pathway in which proinflammatory cytokines IL12 mediate the innate immune response (Trinchieri, 2003). However, we did not find any cytokines, either proinflammatory or anti-inflammatory, to be molecular risk factors. Although several cytokines were dysregulated in the health conditions we examined, there was no uniform correlation with HR of COVID-19 mortality.
Limitations of our study include the lack of individual risk factors passing a stringent statistical threshold and no consideration of multivariate effects. Although our analysis identified marginal molecular risk factors passing the nominal P value cutoff, none had a significant FDR after correction for multiple comparisons, which disqualified them as prognostic markers. However, analysis using the aggregation of these risk factor genes discovered significantly enriched biological processes, with the best FDR < 10-4 (Table 2). Therefore, we are confident that chronic ER stress and immune dysregulation in pre-existing health conditions increased risk of COVID-19 mortality. Our analyses were based on univariate models, in which we examined the expression levels of each gene separately. Because multiple genes are dysregulated concurrently and a combination of them contributes to COVID-19 prognosis, a more realistic model should consider their combined effect. However, because the transcriptomics data were derived from individual patients and HRs of COVID-19 mortality were from summary statistics of an epidemiology study, we chose to use univariate models that are more straightforward to interpret. Lastly, our analyses focused on associating pre-existing transcriptional dysregulations to the ultimate outcome of COVID-19 assessed on mortality. However, dynamic changes of molecular and cellular processes are expected during the clinical course of COVID-19, which are potentially influenced by pre-existing conditions, as well.
Meta-analysis studies have emerged to associate pre-existing conditions with severe COVID-19 outcomes (Treskova-Schwarzbach et al., 2021;Rosario Ferreira et al., 2021). Our study adds to the current literature by developing a new analytical approach to integrating epidemiology data and omics data derived from disjoint cohorts and discovering novel molecular risk factors. While we focus on transcriptional regulation in this study, an immediate next step is to apply this approach to other molecular changes, including genetic variation, epigenetic modification, and metabolic perturbation to investigate their roles in COVID-19 pathogenesis. As before-infection samples of COVID-19 patients are difficult to acquire, integration of existing multi-omics data and epidemiology data hold promise to accelerate the discovery of diagnostic and therapeutic markers to improve the management of COVID-19 disease.

CONCLUSIONS
Our study illuminates that gene expression dysregulations in pre-existing health conditions that impair innate immunity are molecular risk factors of COVID-19 related death. The individual risk factor genes or gene sets are potential mediators in disease pathogenesis. These findings allow for better prediction of severe outcome, inform the development of preventative measures to reduce fatality, and inform the evaluation of long-term health impact of COVID-19 in different populations.